Human O-GlcNAcase Uses a Preactivated Boat-skew Substrate Conformation for Catalysis. Evidence from X-ray Crystallography and QM/MM Metadynamics

Human O-linked β-N-acetylglucosaminidase (hOGA) is one of the two enzymes involved in nuclear and cytoplasmic protein O-GlcNAcylation, an essential post-translational modification. The enzyme catalyzes the hydrolysis of the GlcNAc-O-(Ser/Thr) glycosidic bonds via anchimeric assistance through the 2-acetamido group of the GlcNAc sugar. However, the conformational itinerary of the GlcNAc ring during catalysis remains unclear. Here we report the crystal structure of wild type hOGA in complex with a nonhydrolyzable glycopeptide substrate and elucidate the full enzyme catalytic mechanism using QM/MM metadynamics. We show that the enzyme can bind the substrate in either a chair- or a boat-like conformation, but only the latter is catalytically competent, leading to the reaction products via 1,4B/1S3 → [4E]‡ → 4C1 and 4C1 → [4E]‡ → 1,4B/1S3 conformational itineraries for the first and second catalytic reaction steps, respectively. Our results reconcile previous experimental observations for human and bacterial OGA and will aid the development of more effective OGA inhibitors for diseases associated with impaired O-GlcNAcylation.


Synthesis of CKII-Cys-GlcNAc glycopeptides
The CKII-Cys-GlcNAc glycopeptides used in this study was generated chemoenzymatically using recombinantly-expressed O-GlcNAc Transferase (OGT).The unmodified peptide was obtained commercially (Biomatik Corporation, Cambridge, ON) and 1-2 mg oligopeptide at a final concentration of 1 mM was incubated with UDP-GlcNAc (5 mM), OGT (1 μM, prepared as previously described 1 ) and 2U shrimp alkaline phosphatase (New England Biolabs, Whitby, ON) in PBS at pH 7.2 containing 12.5 mM MgCl2.Reactions were incubated at 37 °C for 4 h or overnight.Prior to purification, the reactions were terminated by heating at 95 °C for 10 min and then centrifuged at 13,000 x g for 2 minutes.The supernatants were recovered and were purified on an Agilent 1200 series HPLC equipped with an Agilent XDB-C18 Eclipse reversed-phase column (9.4 × 250 mm, 5 μ particle size).Glycopeptides were eluted using a mobile phase consisting of H2O and CH3CN with 0.1% trifluoroacetic acid over a gradient of 10 to 50% CH3CN as appropriate.Fractions containing the product were lyophilized to yield up to 1 mg glycopeptide as a white powder.The glycopeptide was purified to >95% purity and analyzed by High Performance Liquid Chromatography-Electrospray Ionization Mass Spectrometry (LC-MS) in positive ion mode, as described below (Figure S1).Expected mass 1842.7 g/mol; observed: m/z 1843.73 [M+H] + , 922.71 [M+2H] 2+ .LC-MS was performed using a Dionex UltiMate® 3000 Ci Rapid Separation LC system equipped with an UltiMate® 3000 photodiode array detector probing at 250-400 nm, coupled to an HCT ultra ETD II (Bruker Daltonics) ion trap spectrometer, using Chromeleon® 6.80 SR12 software (ThermoScientific), esquireControl version 6.2, Build 62.24 software (Bruker Daltonics), and Bruker compass HyStar 3.2-SR2, HyStar version 3.2, Build 44 software (Bruker Daltonics).Mass spectrometry data analysis was performed using ESI Compass 1.3 DataAnalysis, version 4.4 software (Bruker Daltonics).Prior to analysis, the peptide was dissolved in LC-MS grade water at a concentration of 1.5 mM.A sample of this solution (5 µL) was chromatographically analysed using an Accucore C18 column (50 x 2.1 mm, particle size 2.6 µM, Thermo Fisher Scientific).For the mobile phase, a linear gradient of CH3CN in water (both containing 0.1% formic acid (v/v)) at a flow rate of 0.3 mL min −1 at RT was used.The gradient was started at 5% CH3CN and finished at 95% over 7 minutes, followed by a linear step of 95% CH3CN for 3 minutes.
Crystals were seeded using a x20 diluted stock and the wells were set up in a ratio of 1:0.2:0.8P:S:R.The complex between hOGA and the CKII-Cys-GlcNAc glycopeptide was obtained by soaking the glycopeptide into the crystals at a final concentration of 5 mM.The glycopeptidewas soaked into hOGA crystals at a final concentration of 5 mM for 5 days.After testing the crystals for diffraction in house, data was collected at Diamond using the I04-1 beamline.The diffraction images were integrated using the Xia2 pipeline 3 then further processed using the CCP4 software suite. 4Initially data reduction using Aimless [5][6] was conducted.The resulting structure was obtained after several cycles of refinement in REFMAC [7][8][9][10][11][12] and model building in COOT. 13CCP4mg was used to render the figures). 14Data collection and structure processing statistics are provided in Table S1.

System preparation:
The initial coordinates of the hOGA were taken from the X-ray structure reported here, adding the missing loops with the software Modeller. 15The truncated model described in Roth et al., 2 which comprises amino acids 11-396 (N-terminal fragment) and 535-715 (Cterminal fragment) and preserves the catalytic activity, was used.Since the enzyme only shows catalytic activity in the dimeric form, the two protein subunits were considered.
The coordinates of the Ser-GlcNAc substrate were also taken from the crystal structure reported here, substituting the sulphur atom of the Cys residue by an oxygen atom.The solvent-exposed chain of the glycopeptide, which was not solved in the X-ray structure, was reconstructed from the crystal structure of D175N hOGA in complex with p53-Ser-GlcNAc (PDB: 5UN8, chain G). 16 The software Chimera was used for the overlap of both structures. 17The protonation states of aspartate, glutamate and histidine residues were assigned using the software PropKa3, [18][19] as well as visual analysis of the environment of each residue, considering the optimal pH for the hOGA activity (≈ 6.5). 20The acid/base S4 residue (D175) was taken as protonated, whereas the assisting amino acid (D174) was taken as deprotonated.

Classical MD simulations:
The initial system was prepared using the LeaP code, included in the AmberTools21 package. 21The hOGA with the glycopeptide unit was solvated in a cubic box containing 57201 water molecules and 20 sodium ions, in order to neutralize the protein charge.The force fields FF14SB, 22 GLYCAM_06j-1 23 and TIP3P 24 were used for the protein, GlcNAc and water solvent molecules, respectively.MD simulations were performed in several steps.We started with an energy minimization, using the steepest descent and conjugate gradients methods, in three steps.First, we relaxed the solvent molecules, holding the protein and substrate fixed.Secondly, the protein was relaxed but keeping the two catalytic residues (D174 and D175) and the substrate fixed.Finally, the entire system was relaxed.The system was subsequently heated to 300 K by increasing the temperature in intervals of 50K over 50 ps runs.During the first run, the enzyme and the substrate were kept fixed, whereas only the backbone atoms were kept fixes in the subsequent heating runs.Afterwards, the water density was converged at 300 K by running a simulation in the NPT ensemble for 500 ps, keeping the restrains in the protein backbone and the interaction between the substrate and the catalytic residues (in particular, the distance between the carboxylate group of D174 and the NH group of the GlcNAc and between the proton of the carboxylic group of D175 and the glycosidic oxygen we kept at values ≤ 3 Å).The system was subsequently equilibrated in the NVT ensemble during 50 ns, preserving only the distance restraints.Finally, the system was simulated in the NVT ensemble with no restraints for 100 ns.Three replicas assigning different initial velocities were performed, and one of them was extended up to 1 µs (Figure S3).Two replicas of classical MD were also carried out using CKII-GlcNAc-Cys instead of the natural substrate, following the same protocol.In this case, the parameters of the GAFF [25] force-field were used for modelling the sugar link to the sulphur atom.All simulations were carried out with the AMBER20 software, 21 whereas analyses (Figures S3-S14) were carried out using VMD 25 and cpptraj. 26

QM/MM MD simulations:
Two snap-shots of the previous classical MD simulation, in which the acid/base residue interacts either with the glycosidic oxygen or with the assistant residue (D174), (MCA S5 and MCB, respectively) were selected.QM/MM MD simulations, combining DFT-based Born-Oppenheimer MD with force-field MD, were carried out starting from each of these two snap-shots of the classical MD trajectory, using the software CP2K v9.1 27 coupled to PLUMED v2.8. 28The QM region was selected as to include the Ser-GlcNAc unit and the side chain of the two catalytic residues (D174 and D175).Part of the side of chain of an active site lysine (K98) was also included in the QM region, resulting in a total of 55 atoms (Figure S5, left panel).The QM subsystem was enclosed in a 15.56 x 15.25 x 14.11 Å 3 cell.The remaining atoms were treated at the molecular mechanics (MM) level, whereas the dangling bonds between the QM and the MM region were capped with hydrogen atoms.The QM region was described at DFT level using the PBE functional, 29 in consistency with previous works describing GH reaction mechanisms and carbohydrate conformations, [30][31][32] using the dual basis set of Gaussians and plane-waves (GPW) formalism.The Gaussian triple-ζ valence polarized (TZV2P) basis set was used to expand the wave function, converging the electron density employing an auxiliary plane-wave basis set with a density cut-off of 300 Ry, along with GTH pseudopotentials. 33The structure was first optimized by simulated annealing, followed by an unbiased QM/MM MD simulation at 300 K in the NVT ensemble for 5 ps, using a time step of 0.5 fs.

QM/MM metadynamics simulations of the MCA / MCB conformational change:
QM/MM metadynamics simulations [34][35][36][37] were carried out using CP2K v9.1 coupled to PLUMED v2.8. 28Two collective variables were employed (Figure S11A).The first CV accounts for the interaction between D175 and the glycosidic oxygen as well as the loss of the hydrogen bond between D174 and D175.The second CV describes the torsion of the COOH group of D175.The time evolution of the CVs, together with the reconstructed FES, are shown in Figures S11B and S11C.Gaussian-like biasing potentials of height 1 kcal/mol and 0.1 and 0.05 c. v. u. width for CV1 and CV2, respectively, were added every 100 MD steps (50 fs).For increasing accuracy, the Gaussian height was decreased to 0.1 kcal/mol in the region near the TS.The simulation was stopped after one recrossing over the TS (2426 Gaussians deposited, corresponding to 121.3 ps).

Calculation of the GlcNAc conformational free energy landscape
The conformational free energy landscape (FEL) of the GlcNAc in the active site of hOGA was computed employing collective variables that are derived from the Cremer-

S6
Pople puckering coordinates. 38In particular, the Cartesian projection coordinates (see definition in reference 39 ) divided by the Cremer-Pople puckering amplitude (Q) were used: CV1 = qx/Q, CV2 = qy/Q and CV3 = qz/Q.Two QM/MM metadynamics simulations were performed, starting from MCA and MCB, respectively.In both cases, the QM region included the Ser-GlcNAc unit (32 atoms) enclosed in a 13.41 x 10.80 x 13.39 Å 3 cell.The same protocol than in previous metadynamics has been followed for the equilibration of the system, using a plane-waves cutoff of 290 Ry.Gaussian-shaped potentials of height 1.2 kcal/mol were added every 30 fs with a width of 0.035, 0.030, 0.020 c. v. u. for qx/Q, qy/Q and qz/Q, respectively.To increase accuracy, the height was decreased to 0.6 kcal/mol after 300 ps (10000 potentials added).The simulation was stopped after extending the simulation 50 ps more.Reweighting into the θ and ϕ Cremer-Pople puckering coordinates [28] was performed with PLUMED. 28

QM/MM metadynamics simulations of the cyclization step:
One snap-shot from the previous simulation, corresponding to MCA (i.e. with the acid/base residue D175 oriented towards the glycosidic oxygen and the sugar adopts a boat-type conformation, 1,4 B), was selected as initial structure for QM/MM metadynamics simulations [34][35][36][37] of the cyclization step.One collective variable, composed by four distances, was used to build the bias and reconstruct the free energy profile.To properly discriminate between reactants and products, the distances corresponding to bonds that are formed in the reactants state (MCA) were defined as positive, whereas those corresponding to bonds being formed in the products of the half-reaction (i.e. the oxazoline/oxazolinium-ion intermediate, INT) were defined as negative, leading to the following expression for the CV:   -

𝑑 -𝑑 . The first pair of distances (𝑑 -𝑑
) accounts for the protonation of the leaving group, whereas the other pair ( - ) describes the intramolecular nucleophilic attack.The time evolution of the CV is shown in Figure S7.Gaussian-like biasing potentials of height 1 kcal/mol and width 0.1 c. v. u. were added every 100 MD steps (50 fs).For a better accuracy, the Gaussian height was decreased (0.5 kcal/mol) upon crossing and recrossing over the TS.Following literature recommendations, 40 the simulation was stopped after the recrossing over the TS, resulting in the addition of a total number of 1500 Gaussian functions (75 ps).The structure of the

S7
TS was further confirmed by committor analysis.Analysis of the trajectory was carried out with PLUMED and home-made python scripts.

System preparation and classical MD simulations:
One snap-shot from the previous QM/MM metadynamics simulation, in which the oxazolinium ion was formed, was chosen as starting point for the simulation of the second step of the reaction mechanism.The leaving peptide chain was removed, and the system was re-solvated and neutralized in a cubic box containing 50394 water molecules and 18 sodium ions.The force fields FF14SB and TIP3P were used for the protein and water solvent molecules, respectively, whereas the oxazolinium ion was modelled using the GAFF [25] force-field, using the Antechamber tool [26] and RESP charges calculated with Gaussian09 [27] at HF/6-31G* level.A classical MD simulation, using the same protocol than in the previous section, was carried out to relax the system (Figure S8).During the MD, one water molecule was held in the catalytic centre for the first 50 ns by restraining the distances with the C1 of the oxazolinium ion and the carboxylate group of D174 at a value ≤ 3 Å.An unrestrained MD was then carried out for 50 ns more.We computed the probability distribution of the water molecules in the catalytic centre by counting the number of waters in a sphere of radius of 2 Å located in the centre of the C1 of the oxazolinium ion and the two oxygen atoms of the carboxylate of D174.

QM/MM MD simulations:
A frame of the previous unrestrained trajectory with a water molecule displaying distances to the substrate and D174 suitable for the catalysis was selected as the initial The same protocol as in the first catalytic step was adopted, using a cut-off of 350 Ry for the convergence of the electron density.The final structure of the unbiased QM/MM MD calculation was used to initiate the metadynamics simulations.

Simulation data
Coordinate files and other simulation data can be found in the Zenodo repository (https://zenodo.org)at DOI: 10.5281/zenodo.7828194.
structure for the following QM/MM calculations.The QM region was composed by 53 atoms (Figure S5, right panel): the oxazolinium ion, the side chain of the two assisting and acid/base residues (D174 and D175) and the neighbouring Lys (K98), as well as the catalytic water molecule.The QM region was enclosed in a 16.19 x 13.42 x 15.01 Å 3 cell.

S8 4 . 3 .
QM/MM metadynamics simulations of the ring opening:As in the first reaction step, one collective variable defined by four distances was used to build the bias potential and reconstruct the free energy profile: [-  - ].Two of the distances describe the deprotonation of the water by the D175 acid/base residue ( - ), whereas the other two take into account the attack of the water oxygen to the C1 of the oxazolinium ion ( - ).Gaussian-like biasing potentials of height 1 kcal/mol and width 0.1 c. v. u. were added every 100 MD steps (50 fs), decreasing the Gaussian height to 0.5 kcal/mol during crossing and recrossing over the TS for improving accuracy.The simulation was stopped after one recrossing over the TS, (66.5 ps of simulation, 1330 potentials added).The structure of the TS was further confirmed by committor analysis.The time evolution of the CV is shown in Figure S9.

Figure S2. A )Fig S4 .Fig S6 .
Figure S2.A) Superposition between the CKII-Cys-GlcNAc peptide, described in this work, and p53-GlcNAc peptide (PDB ID: 5UN8) bound to D175N hOGA.Both glycopeptides bind in a similar way and are shown in shades of green and blue, respectively.B) Stabilizing interactions between GlcNAc and hOGA observed in the Xray structure.

Fig S7 .S17Fig S9 .
Fig S7.Time evolution of the CV employed in the QM/MM metadynamics of the cyclization step.

Fig S12 .Fig S14 .
Fig S12.Time evolution of the H-NAc and H-D174 distance in an unbiased QM/MM simulation starting from INT.